Abstract

Improved performance in higher-order spectral density estimation is achieved using 
a general class of infinite-order kernels. These estimates are asymptotically less bi- 
ased but with the same order of variance as compared to the classical estimators with 
second-order kernels. A simple, data-dependent algorithm for selecting bandwidth 
is introduced and is shown to be consistent with estimating the optimal bandwidth. 
Bispectral simulations with several standard models are used to demonstrate the per- 
formance of the proposed methodology. 
1 Introduction

Lag-window estimation of the high-order spectra under various assumptions is known
to be consistent and asymptotically normal Q, S, E3, EH. However, convergence rates
of the estimators depend on the order, or characteristic exponent, of the lag-window
used. In general, increasing the order of the lag-window decreases the bias without
affecting the order of magnitude of the variance, thus producing an estimator with a
faster convergence rate. Although estimators using lag-windows with large orders yield
estimates with better mean square error (MSE) rates, they have been overlooked and rarely
used in practice, mainly because of two issues. Firstly, in estimating the second-order
spectral density, lag-windows of order larger than two may yield negative estimates,
despite the fact that the true spectral density is known to be nonnegative. This problem
only pertains, if ever, to the second-order spectral density (since higher-order spectra
are complex-valued), and is easily remedied by truncating the estimator to zero if it
does go negative (thus improving the already optimal convergence rates [3]). Secondly,
when a lag-window has order larger than necessary, the rate of convergence is still
optimal, but the multiplicative constant will be suboptimal 0]. The second problem
is encountered when using a poor choice of large-order lag-window like the box-shaped
truncated lag-window [13], but there are many other alternatives with decent
small-sample performance. Additionally, when the underlying spectral density is sufficiently
smooth, this second issue is irrelevant since the lag-window with the largest order
performs best. The next section introduces a family of infinite-order lag-windows for
estimating the spectral density and higher-order spectra.

The use of infinite-order lag-windows is particularly well suited to the estimation of
higher-order spectra. Under the typical scenario of exponential decay of the
autocovariance function (refer to part (ii) of Theorem 1 within), the MSE rates for
estimating the second-order spectral density using a lag-window of order 2 and an
infinite-order lag-window are $N^{-4/5}$ and $(\log N)/N$ respectively. However, when
estimating the third-order spectral density, or bispectrum, the MSE rates become
$N^{-2/3}$ and $(\log N)/N$ respectively. The disparity grows stronger with yet
higher-order spectra.

The problem of choosing the best bandwidth still remains. The optimal bandwidth
typically depends on the unknown spectral density, leading to a circular problem:
estimation of the spectrum requires estimation of the bandwidth, which in turn requires
estimation of the spectrum. There have been many fixes to this problem; see [§[] for a
survey of several methods. Section 3 introduces a new simple, data-dependent method
of determining the bandwidth which is shown to converge to the asymptotically ideal
bandwidth for flat-top lag-windows. An alternative bandwidth selection algorithm is
also included that is designed for use with second-order lag-windows. This algorithm 
uses the plug-in principle for bandwidth selection but with the flat-top estimators as 
the plug-in pilots. 

Particular attention is given to the bispectrum as it is a key tool in several linearity
and Gaussianity tests including [?] and [15]. The general bandwidth selection algorithm
is refined and expanded for the bispectrum. Bispectral simulations compare two
different flat-top lag-window estimators of the bispectrum, with accompanying bandwidth
selection algorithm, to the lag-window estimator using the order two "optimal"
lag-window and plug-in bandwidth selection procedure as described in Rao [16].




We define the flat-top lag-window estimate in Section 2 and derive its higher-order
MSE convergence in Theorem 1 under the ideal bandwidth. In Section 3, a bandwidth
selection algorithm tailored to the flat-top estimate is introduced and is shown to 
automatically adapt to the smoothness of the underlying spectral density and converge 
in probability to the ideal bandwidth. The focus is then shifted to the bispectrum 
in Section 4 where the most general function invariant under the symmetries of the 
bivariate cumulant function is constructed. The bandwidth algorithm is specialized 
for the bispectrum, and a separate bandwidth algorithm for second-order lag-windows 
is included that is based on the plug-in method with flat-top estimators as pilots. 
Simulations of the bispectrum in Section 5 exhibit the strength of the flat-top estimators 
and the bandwidth algorithms. 



2 Asymptotic performance of a general flat-top window

Let $x_1, x_2, \ldots, x_N$ be a realization of an $r$-vector valued $s$th-order stationary (real
valued) time series $X_t = (X_t^{(1)}, \ldots, X_t^{(r)})'$ with (unknown) mean $\mu = (\mu^{(1)}, \ldots, \mu^{(r)})'$.
Consider the $s$th-order central moment

$$\tilde{C}_{a_1,\ldots,a_s}(\tau_1,\ldots,\tau_s) = E\left[\left(X_{t+\tau_1}^{(a_1)} - \mu^{(a_1)}\right) \cdots \left(X_{t+\tau_s}^{(a_s)} - \mu^{(a_s)}\right)\right] \tag{1}$$

where the right-hand side is independent of the choice of $t \in \mathbb{Z}$. Stationarity allows us to
write the above moment as a function of $s-1$ variables, so we define
$\tilde{C}_{a_1,\ldots,a_s}(\tau_1,\ldots,\tau_{s-1}) = \tilde{C}_{a_1,\ldots,a_s}(\tau_1,\ldots,\tau_{s-1},0)$.
For notational convenience, the sequence $a_1,\ldots,a_s$ will be dropped, so
$\tilde{C}_{a_1,\ldots,a_s}(\tau_1,\ldots,\tau_{s-1})$ will be denoted simply by $\tilde{C}(\tau)$. Also $\tau_s$ will
occasionally be used, for convenience, with the understanding that $\tau_s = 0$.
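We express the $s$th-order joint cumulant as

$$C_{a_1,\ldots,a_s}(\tau_1,\ldots,\tau_{s-1}) = \sum_{(\nu_1,\ldots,\nu_p)} (-1)^{p-1}(p-1)!\,\mu_{\nu_1}\cdots\mu_{\nu_p}$$

where the sum is over all partitions $(\nu_1,\ldots,\nu_p)$ of $\{0,\ldots,\tau_{s-1}\}$ and
$\mu_{\nu_j} = E\left[\prod_{\tau_i\in\nu_j} X_{\tau_i}^{(a_i)}\right]$;
refer to [?] for another expression of the joint cumulant. The ($s$th-order) spectral
density is defined as

$$f(\omega) = \frac{1}{(2\pi)^{s-1}} \sum_{\tau\in\mathbb{Z}^{s-1}} C(\tau)\,e^{-i\tau\cdot\omega}.$$
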
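We adopt the usual assumption on $C(\tau)$ that it be absolutely summable, thus
guaranteeing the existence and continuity of the spectral density. A natural estimator of
$C(\tau)$ is given by

$$\hat{C}(\tau_1,\ldots,\tau_{s-1}) = \sum_{(\nu_1,\ldots,\nu_p)} (-1)^{p-1}(p-1)!\,\hat\mu_{\nu_1}\cdots\hat\mu_{\nu_p} \tag{2}$$

where

$$\hat\mu_{\nu_j} = \frac{1}{N-\max(\nu_j)+\min(\nu_j)} \sum_{k=1-\min(\nu_j)}^{N-\max(\nu_j)} \prod_{\tau_i\in\nu_j} x_{k+\tau_i}^{(a_i)}.$$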

It turns out that the second-order and third-order cumulants, those that give rise
to the spectrum and bispectrum respectively, are precisely the second-order and
third-order central moments [TJ. Therefore, in these cases, we can greatly simplify $\hat{C}(\tau)$ to

$$\hat{C}(\tau) = \frac{1}{N}\sum_{t=1}^{N-\gamma}\prod_{j=1}^{s}\left(x_{t-\alpha+\tau_j}^{(a_j)} - \bar{x}^{(a_j)}\right) \tag{3}$$

where $\alpha = \min(0,\tau_1,\ldots,\tau_{s-1})$, $\gamma = \max(0,\tau_1,\ldots,\tau_{s-1}) - \alpha$, and
$\bar{x}^{(a_j)} = \frac{1}{N}\sum_{t=1}^{N} x_t^{(a_j)}$ for $j = 1,\ldots,s$. We extend the domain of
$\hat{C}$ to $\mathbb{Z}^{s-1}$ by defining $\hat{C}(\tau) = 0$ when the sum in (2) or (3) is empty.
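
As an illustration of the simplified estimator (3) in the bispectral case ($s = 3$, univariate series), the following is a minimal sketch rather than the authors' implementation. The function name `third_moment_hat` is hypothetical, and the sketch assumes the $1/N$ normalization and the index convention $x_{t-\alpha+\tau_j}$ with $\tau_3 = 0$.

```python
import numpy as np


def third_moment_hat(x, tau1, tau2):
    """Sample third-order central moment at lags (tau1, tau2).

    Assumes a 1/N normalization and returns 0 when the sum is empty.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    lags = (tau1, tau2, 0)          # the last lag is fixed at 0
    alpha = min(lags)
    gamma = max(lags) - alpha
    if gamma >= n:                  # empty sum
        return 0.0
    xc = x - x.mean()               # center with the sample mean
    m = n - gamma                   # number of terms, t = 1, ..., N - gamma
    prod = np.ones(m)
    for tau in lags:
        start = tau - alpha         # 0-based position of x_{t - alpha + tau} at t = 1
        prod *= xc[start:start + m]
    return float(prod.sum() / n)
```

Because the three factors are only shifted copies of the same centered series, this sketch is unchanged when the lags are permuted or re-based, which is the symmetry exploited later for the bispectrum.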

Consider a flat-top lag-window function $\lambda : \mathbb{R}^{s-1} \to \mathbb{R}$ satisfying the following
conditions:

(i) $\lambda(x) = 1$ for all $x$ satisfying $\|x\| \le b$, for some positive number $b$.

(ii) $|\lambda(x)| \le 1$ for all $x$.

(iii) For $M \to \infty$ as $N \to \infty$, but with $M/N \to 0$,



$$\lim_{N\to\infty} \frac{1}{M^{s-1}} \sum_{\|x\|<N} \lambda\left(\frac{x}{M}\right) < \infty.$$



(iv) $\lambda \in L_2(\mathbb{R}^{s-1})$.

The window $\lambda(x)$ is a "flat-top" because of condition (i); namely, it is constant in
a neighborhood of the origin. The constant $b$ in (i) is used below in constructing the
spectral density estimate.

Technically, just requiring $\lambda$ to be bounded could replace criterion (ii), but
there is no benefit in allowing the window to have values larger than 1. Finally, criteria
(iii) and (iv) are satisfied if, for example, $\lambda$ has compact support.
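
To make conditions (i) and (ii) concrete, here is a minimal sketch of one compactly supported flat-top window on the real line (the case $s = 2$), namely a trapezoid. The name `trapezoid_flat_top` and the default flat-top radius `b = 0.5` are illustrative assumptions, not choices made in the text.

```python
def trapezoid_flat_top(x: float, b: float = 0.5) -> float:
    """Trapezoidal flat-top lag-window: 1 on [-b, b], linear decay to 0 at |x| = 1."""
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    ax = abs(x)
    if ax <= b:                 # condition (i): constant near the origin
        return 1.0
    if ax < 1.0:                # linear taper, values stay within [0, 1]
        return (1.0 - ax) / (1.0 - b)
    return 0.0                  # compact support, so (iii) and (iv) hold
```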

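Define $\lambda_M(\tau) = \lambda(\tau/M)$ and consider the smoothed $s$th-order periodogram

$$\hat f(\omega) = \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\|<N} \lambda_M(\tau)\,\hat{C}(\tau)\,e^{-i\tau\cdot\omega}. \tag{4}$$

There is an equivalent expression to this estimator in the frequency domain given by

$$\hat f(\omega) = \Lambda_M * I_{a_1,\ldots,a_s}(\omega) = \int_{\mathbb{R}^{s-1}} \Lambda_M(\omega-\tau)\,I_{a_1,\ldots,a_s}(\tau)\,d\tau$$

where $\Lambda_M$ is the Fourier transform of $\lambda_M$ and $I_{a_1,\ldots,a_s}$ is the $(s-1)$th-order periodogram;
namely,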

A m (t) = f \ M (r)e^ T dT 

and 

However, equation (4) is computationally simpler, and it is this version that will be
used throughout the remainder of this article.
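
For the second-order case ($s = 2$, univariate series), the time-domain estimator (4) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `flat_top_spectral_density` is hypothetical, the window is a trapezoid with flat-top radius `b` (one admissible choice among many), and the sample autocovariance uses the $1/N$ normalization.

```python
import numpy as np


def flat_top_spectral_density(x, omega, M, b=0.5):
    """Flat-top lag-window estimate of the spectral density at frequency omega."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    taus = np.arange(-(n - 1), n)
    # Sample autocovariance at every lag |tau| < N (1/N normalization).
    acov = np.array([xc[: n - abs(t)] @ xc[abs(t):] / n for t in taus])
    # Trapezoidal flat-top window evaluated at tau / M.
    u = np.abs(taus) / M
    lam = np.clip((1.0 - u) / (1.0 - b), 0.0, 1.0)
    total = np.sum(lam * acov * np.exp(-1j * taus * omega))
    # The imaginary parts cancel by symmetry in tau, so only the real part is kept.
    return float(np.real(total) / (2.0 * np.pi))
```

With `M = 1` only the lag-zero term survives, so the estimate reduces to the sample variance divided by $2\pi$ at every frequency, which gives a quick sanity check.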

The asymptotic bias convergence rate (and thus the overall MSE convergence rate)
of the estimator (4) with a flat-top lag-window $\lambda$ is superior to traditional estimators
using second-order lag-windows. The convergence rates of our estimator improve with
the decay rate of the cumulant function $C(\tau)$: the faster the decay to zero, the faster the
convergence. The following theorem outlines convergence rates under three scenarios:
when the decay of $C(\tau)$ is polynomial, exponential, and identically zero after some
finite time (like an MA($q$) process). Throughout, conditions on the time series are
assumed so that

$$\mathrm{var}\left(\hat f(\omega)\right) = O\left(\frac{M^{s-1}}{N}\right). \tag{5}$$

This is a very typical assumption and is satisfied under summability conditions of the
cumulants [l[ or under certain mixing condition assumptions (ill ].

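Theorem 1. Let $\{X_t\}$ be an $r$-vector valued $s$th-order stationary time series with
unknown mean $\mu$. Let $\hat f(\omega)$ be the estimator as defined in (4) and assume (5) is
satisfied.

(i) Assume for some $k \ge 1$, $\sum_{\tau\in\mathbb{Z}^{s-1}} \|\tau\|^k |C(\tau)| < \infty$ and $M \sim aN^c$ with
$c = (2k+s-1)^{-1}$, then

$$\sup_{\omega\in[-\pi,\pi]^{s-1}} \left|\mathrm{bias}\{\hat f(\omega)\}\right| = O\left(N^{-\frac{k}{2k+s-1}}\right) \tag{6}$$

and

$$\mathrm{MSE}\left(\hat f(\omega)\right) = O\left(N^{-\frac{2k}{2k+s-1}}\right).$$

(ii) Assume $C(\tau)$ decreases geometrically fast, i.e. $|C(\tau)| \le De^{-d\|\tau\|}$, for some
positive constants $d$ and $D$ and $M \sim A\log N$ where $A \ge 1/(2db)$, then

$$\sup_{\omega\in[-\pi,\pi]^{s-1}} \left|\mathrm{bias}\{\hat f(\omega)\}\right| = O\left(\frac{1}{\sqrt{N}}\right) \tag{7}$$

and

$$\mathrm{MSE}\left(\hat f(\omega)\right) = O\left(\frac{\log N}{N}\right). \tag{8}$$

(iii) Assume $C(\tau) = 0$ for $\|\tau\| > q$ and let $M$ be constant such that $bM \ge q$; then

$$\sup_{\omega\in[-\pi,\pi]^{s-1}} \left|\mathrm{bias}\{\hat f(\omega)\}\right| = O\left(\frac{1}{N}\right)$$

and

$$\mathrm{MSE}\left(\hat f(\omega)\right) = O\left(\frac{1}{N}\right).$$
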

Remark 1. Equations (6), respectively (7), remain true with the assumptions on $M$
replaced with $M^{k+s-1}/N \to 0$, respectively $e^{M} M^{s-1}/N \to 0$.

Remark 2. Depending on the constant $A$ in part (ii), the bias in (7) may be as small
as $O\left((\log N)^{s-1}/N\right)$.

Remark 3. We do not assume the mean $\mu$ of the time series is known. This adds
an extra term of order $O(M^{s-1}/N)$ to the bias; see the proof of Theorem 1 in the
appendix for further details.

Remark 4. Traditional estimators using second-order lag-windows have bias convergence
rates of order $O(1/M^2)$ regardless of the three scenarios listed in Theorem 1.
However, when the spectral density is smooth enough, like in the case of an ARMA
process (where $C(\tau)$ decays exponentially), traditional estimators perform
considerably worse. For example, estimation of the bispectrum of an ARMA process has an
asymptotic MSE rate of $N^{-2/3}$ in the traditional case, but an asymptotic MSE of
$(\log N)/N$ using flat-top lag-windows. The distinction is even more profound in
estimating higher-order spectra where the best rate achieved is $N^{-4/(3+s)}$ for traditional
estimators and again $(\log N)/N$ using flat-top lag-windows. Even in the worst case of
polynomial decay, our proposed estimator still beats, or possibly ties with, traditional
estimators in terms of asymptotic MSE rates.
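
The rate comparison in Remark 4 is simple exponent arithmetic, and a short sketch makes the "beats, or possibly ties" statement explicit. The helper names are hypothetical; the exponents are the ones quoted in the text, $-4/(3+s)$ for second-order windows and $-2k/(2k+s-1)$ for the flat-top estimator under polynomial decay of order $k$.

```python
from fractions import Fraction


def traditional_mse_exponent(s: int) -> Fraction:
    """Exponent of N in the best MSE rate for second-order lag-windows."""
    return Fraction(-4, 3 + s)


def flat_top_poly_exponent(k: int, s: int) -> Fraction:
    """Exponent of N in the flat-top MSE rate under polynomial decay of order k."""
    return Fraction(-2 * k, 2 * k + s - 1)


for s in (2, 3, 4):
    # k = 2 ties with the traditional rate; larger k gives a strictly better rate.
    print(s, traditional_mse_exponent(s), flat_top_poly_exponent(2, s), flat_top_poly_exponent(3, s))
```

A more negative exponent means faster convergence, so for $k = 2$ the two rates coincide and for $k > 2$ the flat-top rate is strictly better.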

The asymptotic analysis in Theorem 1 relies on having the appropriate bandwidth
$M$ based on the various decay rates of $C(\tau)$. In the next section we propose an algorithm
that, for the most part, automatically detects the correct decay rate of $C(\tau)$ and
supplies the practitioner with an asymptotically consistent estimate of $M$.
3 A Bandwidth Selection Procedure



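For $\tau \in \mathbb{Z}^{s-1}$, consider the normalized cumulant function

$$\rho(\tau) = \frac{C(\tau)}{\left[\prod_{j=1}^{s} C_{a_j a_j}(0)\right]^{1/2}}$$

with natural estimator

$$\hat\rho(\tau) = \frac{\hat{C}(\tau)}{\left[\prod_{j=1}^{s} \hat{C}_{a_j a_j}(0)\right]^{1/2}}.$$

Let $B_{x,y}$ (for $0 \le x < y$) denote the set of indices in $\mathbb{Z}^{s-1}$ contained in the half-open
$(s-1)$-dimensional annulus of inner radius $x$ and outer radius $y$, i.e.

$$B_{x,y} = \{\tau \in \mathbb{Z}^{s-1} : x < \|\tau\| \le y\}. \tag{9}$$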

The following algorithm for estimating the bandwidth of a flat-top estimator is a 
multivariate extension of an algorithm proposed in 

Bandwidth Selection Algorithm

Let $k > 0$ be a fixed constant, and $a_N$ be a nondecreasing sequence of
positive integers tending to infinity such that $a_N = o(\log N)$. Let $\hat m$ be the
smallest number such that

$$|\hat\rho(\tau)| < k\sqrt{\frac{\log_{10} N}{N}} \quad \text{for all } \tau \in B_{\hat m,\,\hat m + a_N}. \tag{10}$$

Then let $\hat M = \hat m/b$ (where $b$ is the "flat-top radius" as defined by condition
(i) of a flat-top lag-window).
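
The selection rule can be sketched for the simplest setting, a univariate series and $s = 2$, where the annulus reduces to the lags $\hat m + 1, \ldots, \hat m + a_N$ (negative lags add nothing because the sample autocorrelation is symmetric). This is a sketch under assumptions, not the authors' code: the function name, the defaults `k = 2.0`, `a_n = 5`, `b = 0.5`, and the fallback when no lag qualifies are all illustrative choices.

```python
import math

import numpy as np


def select_flat_top_bandwidth(x, k=2.0, a_n=5, b=0.5):
    """Empirical flat-top bandwidth rule for a univariate series (s = 2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    c0 = xc @ xc / n

    def rho_hat(tau):
        # Sample autocorrelation at a nonnegative lag tau.
        return (xc[: n - tau] @ xc[tau:]) / (n * c0)

    threshold = k * math.sqrt(math.log10(n) / n)
    for m in range(n - a_n):
        # Smallest m whose next a_n autocorrelations are all below the threshold.
        if all(abs(rho_hat(m + j)) < threshold for j in range(1, a_n + 1)):
            return m / b
    return (n - a_n) / b  # fallback, an assumption of this sketch
```

For strongly autocorrelated data the first lag already exceeds the threshold, so the selected bandwidth is at least $1/b$.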

Remark 5. A norm was not specified in (9) and any norm may be used. The sup
norm, for example, may be preferable to the Euclidean norm in practice since the
region in (9) becomes rectangular instead of circular.

Remark 6. The positive constant $k$ is irrelevant in the asymptotic theory, but is
relevant for finite-sample calculations. In order to determine an appropriate value of $k$
for computation, we consider the following approximation

$$\sqrt{N}\left(\hat\rho(\tau_0) - \rho(\tau_0)\right) \overset{\cdot}{\sim} \mathcal{N}(0,\sigma^2). \tag{11}$$

This approximation holds under general assumptions of the time series and for any
fixed $\tau_0 \in \mathbb{Z}^{s-1}$. The variance $\sigma^2$ does not depend on the choice of $\tau_0$ provided $\tau_0$
is not a "boundary point"; see [3] for more details. Let $\hat\sigma$ be the estimate of $\sigma$ via a
resampling scheme like the block bootstrap. An approximate pointwise 95% confidence
bound for $\rho(\cdot)$ is given by $\pm 1.96\,\hat\sigma/\sqrt{N}$. Therefore if we let $a_N = 5$, then $k = 2\hat\sigma$
generates an approximate 95% simultaneous confidence bound by Bonferroni's inequality by noting
that $\sqrt{\log_{10} N} \approx 1.5$ for moderately sized $N$.
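
The arithmetic behind Remark 6 can be checked in a few lines. The sketch below assumes the threshold has the form $k\sqrt{\log_{10} N / N}$ with $k$ equal to twice the estimated standard deviation, so that the band is $2\sqrt{\log_{10} N}$ standard errors wide, and compares this multiplier with the two-sided Bonferroni critical value for five simultaneous 95% bounds. The variable and function names are illustrative.

```python
import math
from statistics import NormalDist

a_n = 5  # number of lags checked simultaneously

# Two-sided Bonferroni critical value for a_n simultaneous bounds at overall level 95%.
z_bonferroni = NormalDist().inv_cdf(1.0 - 0.05 / (2 * a_n))


def threshold_multiplier(n: int) -> float:
    """Width of the band, in standard errors, when k is twice the standard deviation."""
    return 2.0 * math.sqrt(math.log10(n))


for n in (200, 2000):
    print(n, round(threshold_multiplier(n), 3), round(z_bonferroni, 3))
```

For sample sizes in the hundreds the multiplier is close to 3, slightly above the Bonferroni value, so the rule is mildly conservative.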
The bandwidth selected using the above procedure converges precisely to the ideal
bandwidth in each of the three cases of Theorem 1, as is proved in the following theorem
under the two natural assumptions in (12) and (13) below$^1$.

Theorem 2. Assume conditions strong enough to ensure that for any fixed n, 

max \p(a + t )-p((T + t)\ = O p ( -L ) (12) 



uniformly in a, and for any M , that may depend on N , the following holds 



X + T) - p{cT + T)l = °* (v (13) 

uniformly in cr. 

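(i) Assume $C(\tau) \sim A\|\tau\|^{-d}$ for some positive constants $A$ and $d > 1$. Then

$$\hat M \overset{P}{\sim} \tilde{A}\,\frac{N^{1/(2d)}}{(\log N)^{1/(2d)}}$$

where $\tilde{A} = A^{1/d}/(k^{1/d} b)$; here $A \overset{P}{\sim} B$ means $A/B \to 1$ in probability.

(ii) Assume $C(\tau) \sim A\xi^{\|\tau\|}$ for some positive constant $A$ and $|\xi| < 1$. Then

$$\hat M \overset{P}{\sim} A_1 \log N$$

where $A_1 = -1/(b\log|\xi|)$.

(iii) Suppose $C(\tau) = 0$ when $\|\tau\| > q$, but $C(\tau) \ne 0$ for some $\tau$ with norm $q$, then
$\hat M \overset{P}{\sim} q/b$.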



4 Bispectrum 

Now we will focus on estimating the bispectrum using flat-top lag-windows. The
third-order cumulant reduces to the third-order central moment with estimator given by (3).
It is easily seen that the third-order central moment, $C(\tau_1,\tau_2)$, satisfies the following
symmetry relations:

$$C(\tau_1,\tau_2) = C(\tau_2,\tau_1) = C(-\tau_1,\tau_2-\tau_1) = C(\tau_1-\tau_2,-\tau_2) \tag{14}$$

Naturally, we would expect the lag-window function, $\lambda(\tau_1,\tau_2)$, in the estimator to
possess the same symmetries. So if a lag-window $\lambda$ does not a priori have the symmetries
as in (14), we can construct a symmetrized version given by

$$\tilde\lambda(x,y) = g\big(\lambda(x,y),\ \lambda(y,x),\ \lambda(-x,y-x),\ \lambda(y-x,-x),\ \lambda(x-y,-y),\ \lambda(-y,x-y)\big) \tag{15}$$

where $g$ is any symmetric function (of its six variables); for example $g$ could be the
geometric or arithmetic mean. It is worth noting that the symmetrized version of $\lambda$
is connected to the theory of group representations of the symmetric group $S_3$. As a
special case, symmetric lag-windows can be constructed from a one-dimensional
lag-window $\lambda(x)$, namely,

$$\tilde\lambda(x,y) = g\big(\lambda(x),\ \lambda(y),\ \lambda(-x),\ \lambda(y-x),\ \lambda(x-y),\ \lambda(-y)\big) \tag{16}$$

and if $\lambda(x)$ is an even function, then (16) becomes $\tilde\lambda(x,y) = h\big(\lambda(x), \lambda(y), \lambda(y-x)\big)$ where
$h$ is any symmetric function (of its three variables).

1 Under general regularity conditions, (12) holds as does the even stronger assumption of $\sqrt{N}$ asymptotic
normality, and (13) holds from general theory of extremes of dependent sequences; refer to Q
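
The symmetrization in (15) can be sketched directly: evaluate the window at the six images of $(x, y)$ and combine them with a symmetric function. The sketch below uses the arithmetic mean by default; the helper name `symmetrize` and the sample window used in the usage note are hypothetical.

```python
from statistics import fmean


def symmetrize(lam, g=fmean):
    """Return a version of the bivariate window `lam` with the symmetries in (14).

    `g` must be a symmetric function of a sequence of six values.
    """
    def lam_sym(x, y):
        values = [
            lam(x, y), lam(y, x),
            lam(-x, y - x), lam(y - x, -x),
            lam(x - y, -y), lam(-y, x - y),
        ]
        return g(values)

    return lam_sym
```

For example, starting from the deliberately asymmetric window `lambda x, y: max(0.0, 1 - abs(x) - 2 * abs(y))`, the symmetrized version takes the same value (up to rounding) at $(0.3, 0.1)$, $(0.1, 0.3)$, $(-0.3, -0.2)$ and $(0.2, -0.1)$, since each of these points is an image of the others under the six maps.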

Several choices of lag-windows are considered in [TtJ including the so-called "optimal
window", $\lambda_{opt}$, which is in some sense optimal among lag-windows of order 2; see
Theorem 2 on page 43 of lfj. This lag-window is defined as 14|

$$\lambda_{opt}(\tau_1,\tau_2) = \frac{8}{\alpha(\tau_1,\tau_2)^2}\,J_2\big(\alpha(\tau_1,\tau_2)\big)$$

where $J_2$ is the second-order Bessel function of the first kind, and

$$\alpha(x,y) = \frac{2\pi}{\sqrt{3}}\sqrt{x^2 - xy + y^2}.$$

Although $\lambda_{opt}$ is optimal among order 2 lag-windows, it is suboptimal relative to higher-order
lag-windows, such as flat-top lag-windows. Also, since $\lambda_{opt}$ is not compactly supported,
it has the potential of being computationally taxing.

We detail two simple flat-top lag-windows satisfying the symmetries in (14), but
the supply of examples is limitless by (15). The first example is a right pyramidal
frustum with the hexagonal base $|x| + |y| + |x - y| = 2$. We let $c \in (0,1)$ be the scaling
parameter that dictates when the frustum becomes flat, that is, the flat-top boundary
is given by $|x| + |y| + |x - y| = 2c$. The equation of this lag-window is given by

$$\lambda_{rpf}(\tau_1,\tau_2) = \frac{1}{1-c}\,\lambda_{rp}(\tau_1,\tau_2) - \frac{c}{1-c}\,\lambda_{rp}\left(\frac{\tau_1}{c},\frac{\tau_2}{c}\right)$$

where $\lambda_{rp}$ is the equation of the right pyramid with base $|x| + |y| + |x - y| = 2$, i.e.,

$$\lambda_{rp}(x,y) = \begin{cases} \left(1 - \max(|x|,|y|)\right)^+, & -1 \le x,y \le 0 \ \text{or}\ 0 \le x,y \le 1 \\ \left(1 - \max(|x+y|,|x-y|)\right)^+, & \text{otherwise.} \end{cases}$$



The second flat-top lag-window that we propose is the right conical frustum with
elliptical base $x^2 - xy + y^2 = 1$. As in the previous example, there is a scaling parameter
$c \in (0,1)$, and the lag-window becomes flat in the ellipse $x^2 - xy + y^2 = c^2$. The equation
of this lag-window is given by

$$\lambda_{rcf}(\tau_1,\tau_2) = \frac{1}{1-c}\,\lambda_{rc}(\tau_1,\tau_2) - \frac{c}{1-c}\,\lambda_{rc}\left(\frac{\tau_1}{c},\frac{\tau_2}{c}\right)$$

where $\lambda_{rc}$ is the equation of the right cone with base $x^2 - xy + y^2 = 1$, i.e.,

$$\lambda_{rc}(x,y) = \left(1 - \sqrt{x^2 - xy + y^2}\right)^+.$$

Figure 1: Plots of the three lag-windows, $\lambda_{opt}$, $\lambda_{rpf}$, and $\lambda_{rcf}$ (with $c = 1/2$ in the latter two).

Although in both examples the value for $b$, as defined in property (i) of the flat-top
lag-window function, is smaller than the parameter $c$, the symmetries (14) permit us
to only consider the region $0 \le y \le x$ for which a circular arc of radius $c$ does fit. So
in the two examples above, we take the value of $b$ to be the parameter $c$.
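
Both frustum windows follow the same recipe: take a base shape (pyramid or cone) and subtract a rescaled copy so that the top becomes flat. The sketch below is a minimal illustration of that construction; the function names are hypothetical and the default `c = 0.5` simply mirrors the value used in Figure 1.

```python
import math


def _pos(v: float) -> float:
    """Positive part, written (.)^+ in the text."""
    return max(v, 0.0)


def lam_rp(x: float, y: float) -> float:
    """Right pyramid over the hexagon |x| + |y| + |x - y| = 2."""
    if x * y >= 0:  # x and y share a sign (or one of them is zero)
        return _pos(1.0 - max(abs(x), abs(y)))
    return _pos(1.0 - max(abs(x + y), abs(x - y)))


def lam_rc(x: float, y: float) -> float:
    """Right cone over the ellipse x^2 - xy + y^2 = 1."""
    return _pos(1.0 - math.sqrt(max(x * x - x * y + y * y, 0.0)))


def frustum(base, c: float = 0.5):
    """Flat-top frustum built from `base`; equals 1 wherever base(x/c, y/c) > 0."""
    def lam(x: float, y: float) -> float:
        return (base(x, y) - c * base(x / c, y / c)) / (1.0 - c)
    return lam


lam_rpf = frustum(lam_rp)
lam_rcf = frustum(lam_rc)
```

Inside the scaled base both terms are linear in the same distance, so their difference is exactly the constant $1 - c$ before normalization; outside it, only the first term survives and the window decays linearly to zero.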

The bandwidth selection algorithm can be refined in the context of the bispectrum.
The symmetries in (14) allow restriction to the region

$$\{(\tau_1,\tau_2) \in \mathbb{R}^2 \mid 0 \le \tau_2 \le \tau_1\}. \tag{17}$$

Here is the modified bandwidth selection algorithm for flat-top kernels that is tailored
to the bispectrum:

Practical Bandwidth Selection Algorithm for the Bispectrum

Let $k = k_1 > 0$ if $n = 1$, otherwise $k = k_2 > 0$, and let $L$ be a positive integer
that is $o(\log N)$. Order the points $\{(\tau_1,\tau_2) \in \mathbb{Z}^2 \mid 0 < \tau_2 < \tau_1\} \cup \{(1,0)\}$ with
the usual lexicographical ordering, so $P_1 = (1,0)$, $P_2 = (2,1)$, $P_3 = (3,1)$,
$P_4 = (3,2)$, and so forth; in general, $P_n = (i,j)$ where $i = \left\lfloor \frac{3}{2} + \sqrt{2n-2} \right\rfloor$
and $j = n - \frac{1}{2}(i^2 - 3i) - 2$. Let $\hat m$ be the smallest number such that

(18)

Then let $\hat M = (\text{first coordinate of } P_{\hat m})/b = \left\lfloor \frac{3}{2} + \sqrt{2\hat m - 2} \right\rfloor / b$.

Remark 7. Except for the first point, (1,0), this algorithm does not incorporate
boundary points since the asymptotic variance is larger on the boundary; the first
point is included as there are no interior points with first coordinate equal to 1. The
constant $k$ is adjusted to account for the larger variance in the first point by providing
a separate threshold, $k_1$, for this point.

Remark 8. As suggested with the general algorithm, a subsampling procedure should
be used to determine the appropriate constants $k_1$ and $k_2$. However, one should be
careful when choosing a point $\tau_0$ for the approximation (11), since high variances at the
origin and on the boundary tend to cause high variances near the origin and near the
boundary in finite-sample scenarios. Therefore an interior point like (6,3) (as opposed
to (2,1)) should be used in determining $k_2$, and a point like (3,0) (as opposed to (1,0))
should be used in determining $k_1$.
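
The closed-form index map for the ordered points $P_n$ in the practical algorithm can be checked against a direct enumeration. The following sketch uses hypothetical helper names and compares the first 200 points produced by the two routes.

```python
import math


def point_from_index(n: int) -> tuple[int, int]:
    """Closed form for P_n: i = floor(3/2 + sqrt(2n - 2)), j = n - (i^2 - 3i)/2 - 2."""
    i = math.floor(1.5 + math.sqrt(2 * n - 2))
    j = n - (i * i - 3 * i) // 2 - 2  # i * (i - 3) is always even
    return i, j


def ordered_points(count: int) -> list[tuple[int, int]]:
    """Lexicographic enumeration of (1, 0) followed by all (t1, t2) with 0 < t2 < t1."""
    points = [(1, 0)]
    t1 = 2
    while len(points) < count:
        points.extend((t1, t2) for t2 in range(1, t1))
        t1 += 1
    return points[:count]


print([point_from_index(n) for n in range(1, 8)])
```

The first coordinate of $P_n$ is what enters the bandwidth, which is why only the formula for $i$ appears in the final step of the algorithm.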

A modified bandwidth selection procedure is now proposed for use with the suboptimal
lag-windows of order 2. In this case, we propose using a bandwidth selection
procedure based on the usual "solve-the-equation plug-in" approach [gj, but with flat-top
estimates of the unknown quantities as the plug-in pilots. This will afford faster
convergence rates of the bandwidth as compared to estimates based on second-order pilots
as well as solve the problem of selecting bandwidths for the pilots.

The optimal bandwidth at each point in the region (17), when using differentiable
second-order kernels, is derived in Jig], and is given by



M X (UJI,UJ2) 



ttN 



2 



2 



+ 



d 2 X(r 1 ,r 2 ) 



\M\L 3 f(ui)f(u2)f(wi + u 2 ) \ dndn 



7(^1, w 2 ) 



l~l=T2=0 / 



(19) 



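Estimation of the spectral density using flat-top lag-windows is discussed above, and
estimation of the partial derivatives of the bispectrum follows similarly. For instance, the
three second-order partial derivatives needed in (19) can be estimated by

$$\widehat{\frac{\partial^2 f}{\partial\omega_i\,\partial\omega_j}}(\omega_1,\omega_2) = -\frac{1}{(2\pi)^2} \sum_{\tau_1=-N}^{N}\sum_{\tau_2=-N}^{N} \tau_i\tau_j\,\lambda_M(\tau_1,\tau_2)\,\hat{C}(\tau_1,\tau_2)\,e^{-i(\tau_1\omega_1+\tau_2\omega_2)}, \quad i,j = 1,2. \tag{20}$$
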



By mimicking the proof of Theorem 1, the estimator in (20) has the same asymptotic
performance as the estimator $\hat f(\omega)$ in Theorem 1 but under a slightly stronger
assumption for part (i) that $\sum_{\tau\in\mathbb{Z}^2} \|\tau\|^{k+2} |C(\tau)| < \infty$. We construct the estimator
$\hat M_\lambda$ by replacing the unknown $f$ and its derivatives in (19) with flat-top estimates,
producing



ttN 



\M\l 2 1(^1)1(^2)1(^1 +^2) 



d 2 X(r U T2] 



dr\ dr\ 



T\ =T2 =0 , 



o 2 



o 2 



+ 



o 2 



f(ui,uj 2 ) 



The next theorem provides convergence rates of the plug-in algorithm with flat-top 
pilots. 
Theorem 3. Assume conditions on $\rho$ such that (12) and (13) of Theorem 2 hold true,
and assume conditions strong enough to ensure$^2$

var (jQ;) = O (*,J = 1,2) 

(i) Assume $C(\tau) \sim A\|\tau\|^{-d}$ for some positive constants $A$ and $d > s + 2$. Then




(ii) Assume $C(\tau) \sim A\xi^{\|\tau\|}$ for some positive constant $A$ and $|\xi| < 1$. Then
(iii) Suppose $C(\tau) = 0$ when $\|\tau\| > q$, but $C(\tau) \ne 0$ for some $\tau$ with norm $q$, then

^ = ^( 1+0 >(^))- 

In many cases, the convergence is a significant improvement over the traditional
plug-in approach with second-order lag-window pilots. For example, the convergence
of the bandwidth for data from an ARMA process would be $M\left(1 + O_P(N^{-2/9})\right)$ using
second-order pilots and techniques similar to @, Q], but by using flat-top pilots, the
convergence improves to $M\left(1 + O_P\left(\sqrt{\log N/N}\right)\right)$.



5 Bispectral Simulations 

The three lag-windows detailed above, $\lambda_{opt}$, $\lambda_{rpf}$, and $\lambda_{rcf}$, are compared by their mean
square error performance in estimating the bispectrum of four standard time series
models. Three criteria are used to evaluate the performance of the bispectral estimates.
The first two criteria are the estimators' performance in estimating the bispectrum at
the two points (0,0) and (2,1). The bispectrum at the point (0,0) is real-valued, and
estimates typically have variances significantly larger than estimates at the interior
point (2,1) (exactly 30 times larger, asymptotically, if the second-order spectrum is
flat). The bispectrum at the point (2,1) is complex-valued and performance is evaluated
based on the estimation of the real part, imaginary part, and absolute value. The third
criterion of evaluation is a composite evaluation of performance of the estimators over
a rough grid of six points, standardized appropriately (further details below). The
simulations are computed with data from the four stationary time series models: iid
$\chi^2_1$, ARMA(1,1), GARCH(1,1), and bilinear(1,0,1,1). The first two are linear time
series models whereas the last two are nonlinear models. Two sample sizes, $N = 200$ and
$N = 2000$, are used throughout. Every simulation is repeated over 500 realizations.

2 Certain mixing condition assumptions guarantee this; see [12] for an example.
The third criterion of evaluation, the composite evaluation, is now described in further
detail. The symmetries of $C$ as given in (14) induce the following symmetries in the
spectral density:

$$f(\omega_1,\omega_2) = f(\omega_2,\omega_1) = f(\omega_1,-\omega_1-\omega_2) = f(-\omega_1-\omega_2,\omega_2) = \overline{f(-\omega_1,-\omega_2)}$$

The above symmetries in combination with the periodicity of $f$ imply that $f$ can be
determined over the entire plane just by its values in the closed triangle $T$ with vertices
$(0,0)$, $(\pi,0)$, and $(2\pi/3,2\pi/3)$. So $f$ is estimated at $\binom{n-1}{2}$ equally spaced
points inside $T$ with coordinates $\omega_{ij}$, where $i = 1,\ldots,n-1$ and
$j = 1,\ldots,n-i-1$ (we take $n = 5$ in the simulations).

The estimates at $\omega_{ij}$ are standardized to make them comparable. Since, for $(\omega_1,\omega_2)$
inside $T$ [16],



var (/(^c^)) « ^J!^/( Wl )/( W2 )/( Wl + W2 ), 

$\hat f(\omega_1,\omega_2)$ is standardized by dividing it by $\sqrt{f(\omega_1) f(\omega_2) f(\omega_1+\omega_2)}$. This leads to the
composite evaluation of $\hat f$ over a coarse grid of points by the quantity



n— 1 n- 



-r(A)^ £ 

i=l i=i 



v / /(-g ) )/(^ ) )/(^ ) +^ ) : 



and the empirical MSE is calculated by averaging $\mathrm{err}(\lambda)^2$ over the 500 realizations.

In the tables of MSE estimates below, the first two rows are estimates from the
flat-top lag-windows $\lambda_{rpf}$ and $\lambda_{rcf}$ with the bandwidth derived from the Bandwidth
Selection Algorithm for the Bispectrum, as described above, with parameters $L = 5$,
$c = .51$, and $k$ determined via the block bootstrap (see Remarks 6 and 8). The third
and fourth rows are estimates using $\lambda_{opt}$ with bandwidths from the plug-in method
with flat-top pilots (f.p.) and second-order pilots (s.p.) respectively. The first column
of each table concerns the estimation of the bispectrum at (0,0), taking absolute values
if the estimate is complex-valued. The next three columns concern the estimation of
the real part, imaginary part, and absolute value of the bispectrum, respectively, at the
point (2,1). The last column, labeled $T_6$, concerns the composite evaluation over a
coarse grid of 6 points.

Simulations (based on 1000 realizations) were conducted to determine the optimal
finite-sample bandwidth with minimal MSE (checking up to a bandwidth size of 20).
In the first three models (IID, ARMA, and GARCH) the optimal bandwidth is 1 under
each evaluation criterion and every lag-window. The estimators with best MSE
performance in these models were the estimators with the best bandwidth selection
procedure (the choice of lag-window was somewhat secondary). The bilinear model,
however, had different optimal bandwidths depending on the evaluation criterion and
the lag-window. The optimal bandwidths for the bilinear model were incorporated into
MSE tables by subscripting each value with the best bandwidth followed by the second
best bandwidth. The optimality of the flat-top lag-window, independent of the bandwidth
selection procedure, can be observed in this model as the optimal bandwidths
are larger than 1.
Simulations are also carried out to study the bandwidth selection procedure for the
bispectrum. Histograms, placed in Appendix B, depict the selected bandwidths for each
model over 500 realizations under five procedures (a)-(e) described below. Procedure
(a) produces bandwidths for flat-top lag-windows $\lambda_{rpf}$ and $\lambda_{rcf}$ whereas procedures (b)
through (e) produce bandwidths for $\lambda_{opt}$.

(a) Practical bandwidth selection algorithm for the bispectrum of Section 4

(b) Plug-in method at the origin with flat-top pilots$^3$

(c) Plug-in method at the point (2,1) with flat-top pilots$^3$

(d) Plug-in method at the origin with second-order pilots$^4$

(e) Plug-in method at the point (2,1) with second-order pilots$^4$

The performance of the above bandwidth selection procedures is evaluated by
computing MSE estimates based on the simulations determining the optimal bandwidth.
Since procedure (a) produces a global bandwidth, comparison is not so straightforward
in the bilinear case where the optimal bandwidth at the origin is different from
that of the interior.

5.1 IID Data 

Independent and identically distributed $\chi^2_1$ data is generated with a central third moment $\mu_3 = 8$.
Therefore the true bispectrum is $f(\omega_1,\omega_2) = \frac{\mu_3}{(2\pi)^2} \approx .202642$. The following tables
give the empirical MSE calculations of the estimated bispectrum over lengths $N = 200$
and $N = 2000$ based on 500 simulations.
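
The constant value of the bispectrum for iid data follows from dividing the third central moment by $(2\pi)^2$, which is easy to verify numerically. The helper name below is illustrative.

```python
import math


def iid_bispectrum(mu3: float) -> float:
    """Flat bispectrum of an iid sequence with third central moment mu3."""
    return mu3 / (2.0 * math.pi) ** 2


# A chi-square variable with one degree of freedom has third central moment 8.
print(round(iid_bispectrum(8.0), 6))
```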



N = 200                    $|\hat f(0,0)|$    Re $\hat f(2,1)$    Im $\hat f(2,1)$    $|\hat f(2,1)|$    $T_6$
$\lambda_{rpf}$            0.02796            0.02061             3.131e-04           0.02093            709.4
$\lambda_{rcf}$            0.02778            0.02060             3.314e-04           0.02094            709.4
$\lambda_{opt}$ (f.p.)     0.02582            0.02086             3.577e-04           0.02122            709.8
$\lambda_{opt}$ (s.p.)     0.02806            0.02116             7.121e-04           0.02187            715.5

N = 2000                   $|\hat f(0,0)|$    Re $\hat f(2,1)$    Im $\hat f(2,1)$    $|\hat f(2,1)|$    $T_6$
$\lambda_{rpf}$            2.887e-03          2.063e-03           1.799e-05           2.081e-03          71.19
$\lambda_{rcf}$            2.865e-03          2.064e-03           1.875e-05           2.083e-03          71.22
$\lambda_{opt}$ (f.p.)     2.616e-03          2.101e-03           2.085e-05           2.121e-03          71.23
$\lambda_{opt}$ (s.p.)     3.294e-03          2.184e-03           1.039e-04           2.288e-03          71.45

Table 1: MSE estimates based on iid data for N = 200 and N = 2000.



The flat-top estimators and $\lambda_{opt}$ (f.p.) outperform $\lambda_{opt}$ (s.p.) in every criterion
considered. For N = 2000, bandwidth procedures (a), (b), and (c) perform extremely well
(refer to the histograms in Figure 2 in Appendix B), producing the optimal bandwidth
1 over 95% of the time in each case.

3 The pilot estimates were derived from the flat-top lag-window $\lambda_{rpf}$ and the trapezoidal flat-top window
[13]. The bandwidths for the pilot estimators are derived from the bandwidth selection algorithm of Section
[El].

4 The Parzen and optimal lag-windows were used as pilots with bandwidths $\lfloor N^{1/5} \rfloor$ and $\lfloor N^{1/6} \rfloor$
respectively.



5.2 ARMA Model 

The ARMA(1,1) model

$$X_t = .5X_{t-1} - .5Z_{t-1} + Z_t$$

is now considered where $Z_t \sim \mathcal{N}(0,1)$. This time series is Gaussian, so both the
bispectrum and normalized bispectrum are identically zero.



N = 200                    $|\hat f(0,0)|$    Re $\hat f(2,1)$    Im $\hat f(2,1)$    $|\hat f(2,1)|$    $T_6$
$\lambda_{rpf}$            6.102e-05          2.329e-05           4.468e-06           2.776e-05          313.3
$\lambda_{rcf}$            6.760e-05          2.435e-05           4.624e-06           2.897e-05          316.5
$\lambda_{opt}$ (f.p.)     4.422e-05          2.172e-05           5.235e-06           2.696e-05          302.8
$\lambda_{opt}$ (s.p.)     1.198e-04          3.088e-05           2.982e-05           6.070e-05          412.0

N = 2000                   $|\hat f(0,0)|$    Re $\hat f(2,1)$    Im $\hat f(2,1)$    $|\hat f(2,1)|$    $T_6$
$\lambda_{rpf}$            2.997e-06          2.096e-06           6.896e-08           2.165e-06          24.21
$\lambda_{rcf}$            3.297e-06          2.137e-06           7.359e-08           2.210e-06          24.59
$\lambda_{opt}$ (f.p.)     3.129e-06          2.132e-06           2.796e-07           2.412e-06          24.74
$\lambda_{opt}$ (s.p.)     2.142e-05          4.222e-06           4.349e-06           8.571e-06          33.53

Table 2: MSE estimates based on ARMA data for N = 200 and N = 2000.



The flat-top estimators and $\lambda_{opt}$ (f.p.) even more significantly outperform $\lambda_{opt}$ (s.p.)
in this model for every criterion considered. Good performance is mostly attributed
to good bandwidth selection, but the true optimal properties of the flat-top lag-windows
are present and are addressed for the bilinear model.

5.3 GARCH Model 

We now consider the GARCH(1,1) model

$$\begin{cases} X_t = \sqrt{h_t}\, Z_t \\ h_t = \alpha_0 + \alpha_1 X_{t-1}^2 + \alpha_2 h_{t-1} \end{cases}$$

where $\alpha = (0.1, 0.8, 0.1)$ and $Z_t \sim N(0, 1)$. The theoretical values of the bispectrum are
unknown, so they are approximated via simulation over 500 realizations of length
$10^5$ and averaging the four estimators.
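A realization of this model can be generated by iterating the two equations in turn. The sketch below is illustrative only: the function name, burn-in, seed, and start-up values are assumptions, as is the reading of $\alpha$ as $(\alpha_0, \alpha_1, \alpha_2)$ in that order.

```python
import math
import random


def simulate_garch11(n, alpha=(0.1, 0.8, 0.1), burn_in=500, seed=0):
    """Simulate X_t = sqrt(h_t) Z_t, h_t = a0 + a1*X_{t-1}^2 + a2*h_{t-1}.

    burn_in, seed and the initial state are illustrative assumptions.
    """
    a0, a1, a2 = alpha
    rng = random.Random(seed)
    h = a0 / (1.0 - a1 - a2)  # unconditional variance used as a start value
    x = 0.0
    series = []
    for t in range(n + burn_in):
        h = a0 + a1 * x * x + a2 * h      # conditional variance update
        x = math.sqrt(h) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            series.append(x)
    return series


x = simulate_garch11(200)
```

Because $h_t$ is a positive combination of positive terms, the square root is always defined; the burn-in is there only to reduce dependence on the arbitrary start values.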
                         |f(0,0)|    Re f(2,1)   Im f(2,1)   |f(2,1)|    T_6
N = 200
$\lambda_{rpf}$          9.752e-04   5.402e-05   3.92e-05    9.383e-05   113.1
$\lambda_{rcf}$          1.038e-03   5.800e-05   4.391e-05   1.019e-04   115.1
$\lambda_{opt}$ (f.p.)   6.580e-04   4.345e-05   3.182e-05   7.527e-05   110.1
$\lambda_{opt}$ (s.p.)   3.849e-04   3.488e-05   5.112e-05   8.600e-05   125.1
N = 2000
$\lambda_{rpf}$          2.411e-05   2.916e-06   1.555e-06   4.471e-06   7.317
$\lambda_{rcf}$          2.682e-05   3.050e-06   1.745e-06   4.795e-06   7.401
$\lambda_{opt}$ (f.p.)   1.894e-05   2.528e-06   1.632e-06   4.159e-06   7.026
$\lambda_{opt}$ (s.p.)   5.781e-05   5.577e-06   7.577e-06   1.315e-05   9.021

Table 3: MSE estimates based on GARCH data for N = 200 and N = 2000.



For N = 200, $\lambda_{opt}$ (s.p.) performed best at the origin, but considerably worse in
the composite criterion. For the larger N, the flat-top estimators and $\lambda_{opt}$ (f.p.) again
performed significantly better than $\lambda_{opt}$ (s.p.).



5.4 Bilinear Model 

Finally, we consider the BL(1,0,1,1) bilinear model [1(

$$X_t = a X_{t-1} + b X_{t-1} Z_{t-1} + Z_t$$

where $a = b = 0.4$ and $Z_t \sim N(0, 1)$. The complete calculations of the bispectrum
have been worked out in [lfj]; however, the given equation for the bispectrum does not
match the simulations. Therefore theoretical values of the bispectrum were
computed through simulations, as done for the GARCH model. The spectral density
equation provided in lfj is correct and was used.
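The bilinear recursion can be simulated in the same way as the earlier models. This is a minimal sketch under assumed start-up values, burn-in, and seed; none of these choices comes from the paper.

```python
import random


def simulate_bilinear(n, a=0.4, b=0.4, burn_in=500, seed=0):
    """Simulate X_t = a*X_{t-1} + b*X_{t-1}*Z_{t-1} + Z_t with N(0,1) innovations.

    burn_in, seed and zero initial values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x_prev, z_prev = 0.0, 0.0
    series = []
    for t in range(n + burn_in):
        z = rng.gauss(0.0, 1.0)
        x = a * x_prev + b * x_prev * z_prev + z  # linear part plus bilinear term
        if t >= burn_in:
            series.append(x)
        x_prev, z_prev = x, z
    return series


x = simulate_bilinear(200)
```

Unlike the Gaussian ARMA case, the product term $X_{t-1} Z_{t-1}$ makes the series non-Gaussian, which is why a nonzero bispectrum is expected here.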

Whereas the previous three models had an optimal bandwidth of 1 throughout, the
optimal bandwidths for the bilinear model are typically much larger and depend on
the evaluation criterion considered. The subscripted numbers represent the best and
second-best bandwidths for each window (as deduced from simulation).



                         |f(0,0)|      Re f(2,1)         Im f(2,1)         |f(2,1)|          T_6
N = 200
$\lambda_{rpf}$          5.872_{2,3}   5.421e-04_{4,5}   1.008e-03_{1,2}   1.55e-03_{4,5}    806.8_{2,5}
$\lambda_{rcf}$          5.956_{2,3}   6.005e-04_{6,5}   1.073e-03_{1,2}   1.673e-03_{6,5}   817.0_{7,6}
$\lambda_{opt}$ (f.p.)   4.401_{2,1}   4.608e-04_{4,3}   9.654e-04_{1,2}   1.426e-03_{4,3}   807.1_{5,4}
$\lambda_{opt}$ (s.p.)   2.916_{2,1}   3.926e-04_{4,3}   8.623e-04_{1,2}   1.255e-03_{4,3}   791.4_{5,4}
N = 2000
$\lambda_{rpf}$          1.755_{4,3}   7.734e-05_{4,6}   9.867e-05_{1,2}   1.76e-04_{4,6}    71.76_{2,5}
$\lambda_{rcf}$          1.891_{4,3}   7.792e-05_{6,7}   1.012e-04_{1,2}   1.791e-04_{6,7}   74.69_{6,7}
$\lambda_{opt}$ (f.p.)   2.119_{4,3}   6.282e-05_{5,6}   9.443e-05_{1,2}   1.572e-04_{5,4}   71.01_{6,7}
$\lambda_{opt}$ (s.p.)   1.322_{4,3}   5.123e-05_{5,6}   8.064e-05_{1,2}   1.319e-04_{5,4}   72.83_{6,7}

Table 4: MSE estimates based on bilinear data for N = 200 and N = 2000.
For this model, $\lambda_{opt}$ (s.p.) performs better than the other three, but with decreasing
margins as N increases. There is significant improvement of the flat-top estimators
and $\lambda_{opt}$ (f.p.) from N = 200 to N = 2000, making all the estimators mostly equivalent.
The particularly good performance of $\lambda_{opt}$ (s.p.) at the origin is due to a fortuitous
bandwidth selection under sensitive conditions; this is addressed in more detail below.

There is something of a discontinuity in the optimal bandwidths for $\lambda_{rpf}$ under the
composite criterion, as it jumps from a best value of 2 to a second-best value of 5. A
closer look at the MSEs for each bandwidth from 1 to 8 further illustrates this.



N = 200            1       2     3     4     5     6     7     8
$\lambda_{rpf}$    10200   331   851   442   423   457   440   454
$\lambda_{rcf}$    10200   471   985   561   458   457   454   464
$\lambda_{opt}$    6000    730   459   422   418   426   438   454

Table 5: MSE estimates of T_6 with bandwidths one through eight and N = 200

We see that the bandwidth 2 is very good for the flat-top lag-windows but very poor
for $\lambda_{opt}$. Moreover, bandwidths 1 and 3 are extremely bad for the flat-top lag-windows,
and any bandwidth larger than 3 is mostly equivalent among the estimators. In the
bandwidth selection procedure only odd integer bandwidths were selected, since the last
step of the procedure generates the bandwidth by dividing an integer by b = c = .51.
If instead the parameter c = .5 is used, then only even integer bandwidths would be
produced by the algorithm.
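The parity effect described above is easy to check numerically. The sketch below assumes that the quotient is truncated to an integer (rounded down); the rounding rule is not stated in this section, so this is an assumption, and the function name is hypothetical.

```python
import math


def bandwidth_from_m(m, c):
    """Hypothetical last step: divide the integer m by c and round down.

    The use of floor() is an assumption made for illustration.
    """
    return math.floor(m / c)


with_c_051 = [bandwidth_from_m(m, 0.51) for m in range(1, 9)]
with_c_050 = [bandwidth_from_m(m, 0.5) for m in range(1, 9)]
print(with_c_051)  # [1, 3, 5, 7, 9, 11, 13, 15]
print(with_c_050)  # [2, 4, 6, 8, 10, 12, 14, 16]
```

Under this assumption, c = .51 yields only odd bandwidths for small m, because m / 0.51 falls just short of 2m, while c = .5 gives exactly 2m.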

The bispectrum corresponding to the bilinear model resembles a hill peaking at the
origin [la ]. This makes the choice of bandwidth particularly delicate when
estimating at the origin. The following table illustrates this sensitivity.



N = 200            1       2       3       4       5       6       7
$\lambda_{rpf}$    2.062   1.389   1.71    2.879   4.216   5.849   7.22
$\lambda_{rcf}$    2.062   1.390   1.864   3.207   4.848   6.502   8.078
$\lambda_{opt}$    1.823   1.445   2.013   3.13    4.448   5.733   6.852

Table 6: MSE estimates at the origin with bandwidths one through seven and N = 200

We see that selecting any bandwidth besides 2, or possibly 3, leads to a much larger
mean square error. The bispectrum, however, is much flatter at points away from the
origin, like the six interior points used in the composite evaluation. This makes the
MSE less sensitive to the choice of bandwidth when estimating an interior
value, as seen in Table 5 above.
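The best and second-best bandwidths at the origin can be read off Table 6 mechanically. The following sketch uses the Table 6 values as transcribed above; the dictionary keys and the helper name are illustrative.

```python
# MSE at the origin for bandwidths 1..7 and N = 200, copied from Table 6.
table6 = {
    "rpf": [2.062, 1.389, 1.71, 2.879, 4.216, 5.849, 7.22],
    "rcf": [2.062, 1.390, 1.864, 3.207, 4.848, 6.502, 8.078],
    "opt": [1.823, 1.445, 2.013, 3.13, 4.448, 5.733, 6.852],
}


def best_two(mses):
    """Return the (best, second-best) bandwidths, counting bandwidths from 1."""
    order = sorted(range(len(mses)), key=lambda i: mses[i])
    return order[0] + 1, order[1] + 1


for name, mses in table6.items():
    print(name, best_two(mses))
# rpf (2, 3)
# rcf (2, 3)
# opt (2, 1)
```

For every window the MSE at bandwidth 5 is at least three times the minimum, which is the sensitivity the paragraph above describes.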

The simulations up to this point mostly depict the strength of the bandwidth selection
procedure, and not the general asymptotic optimality of the flat-top lag-window.
However, if we consider MSE estimates for a fixed set of bandwidths, as in Table 6,
the flat-top estimates perform better than $\lambda_{opt}$, and this advantage improves with N. The following
table demonstrates the increased performance at N = 2000.
N = 2000           1       2        3        4        5        6        7        8
$\lambda_{rpf}$    2.029   0.9465   0.552    0.4687   0.6002   0.8262   1.029    1.237
$\lambda_{rcf}$    2.029   0.9082   0.5074   0.4917   0.6821   0.9224   1.156    1.346
$\lambda_{opt}$    1.736   0.8919   0.5444   0.5267   0.6444   0.8001   0.9579   1.099

Table 7: MSE estimates at the origin with bandwidths one through eight and N = 2000



Further illustration of the optimality of the flat-top lag-windows is provided in [13],
where second-order spectral density estimation with flat-top lag-windows is addressed.

5.5 Analysis of Bandwidth Procedures 

Histograms of the bandwidths produced by the procedures are provided below. A 
summary of their performance is tabulated in the following table. 

       IID               ARMA              GARCH             Bilinear^a
N      200     2000      200     2000      200     2000      200      2000
(a)    3.18    0.792     1.54    0.248     6.36    0.968     0.413    0.182
(b)    0.862   0.276     0.232   0.050     2.59    0.292     1.63     0.454
(c)    2.71    0.900     0.866   0.142     4.05    0.552     0.633    0.362
(d)    1.45    3.96      1.27    3.22      1.19    3.49      0.185    0.414
(e)    4.66    12.0      4.04    9.36      4.22    9.82      0.0706   0.0394

a Bandwidths 5 and 6 were selected as theoretical bandwidths for procedure (a),
but this is only approximate as the optimal bandwidth varies. True theoretical
bandwidths can be inferred from Table 4.

Table 8: MSE of $\hat{M}/M - 1$ for bandwidth selection procedures (a)-(e)



We see that the simple bandwidth selection algorithm is very effective in producing
accurate, consistent bandwidths. The bandwidth selection procedure (a) can be
seen to be quite accurate from the histograms but tends to produce a few relatively large
bandwidths. This error is compounded when squared error loss is used to evaluate the
performance. The plug-in method with second-order pilots, on the other hand, performs
very poorly and does not even appear consistent.
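To see how a few large bandwidths can dominate a squared-error summary such as the one in Table 8, consider the following sketch. The selected bandwidths here are made-up numbers chosen for illustration and are not simulation output from the paper; the function name is also hypothetical.

```python
def mse_relative(selected, m_opt):
    """Mean of (M_hat / M - 1)^2 over a list of selected bandwidths."""
    return sum((m / m_opt - 1.0) ** 2 for m in selected) / len(selected)


# Hypothetical procedure that is exactly right 95% of the time
# but occasionally returns a much larger bandwidth.
mostly_right = [1] * 95 + [9] * 5
# Hypothetical procedure that is always off by one.
always_off = [2] * 100

print(mse_relative(mostly_right, 1))  # 3.2
print(mse_relative(always_off, 1))    # 1.0
```

Under squared error loss the procedure that is correct 95% of the time scores worse than the one that is never correct, which is why the histograms are a useful complement to the tabulated values.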

Histograms of the five bandwidth selection procedures are provided in Appendix
B. The histograms in the first three models show a clear convergence of procedures
(a) through (c) to the ideal bandwidth 1, whereas the bandwidths from procedures (d)
and (e) grow with N. The histograms for the bilinear model show a general increase
in M with N across each procedure.
6 Conclusions



Flat-top kernels in higher-order spectral density estimation are shown to be asymptotically
superior in terms of MSE to any finite-order kernel estimator. In addition,
a very simple bandwidth selection algorithm is included that delivers ideal bandwidths
tailored to the flat-top estimators. If one chooses not to adopt the infinite-order flat-top
lag-window, then bandwidth selection via the plug-in method with flat-top pilots
demonstrates greatly increased performance and should be used. Finite-sample
simulations show that these flat-top estimators are comparable with, and in many cases
outperform, the popular second-order "optimal" lag-window estimator using the
plug-in method with second-order pilots for bandwidth selection. Simulations show that
the estimation of the bispectrum is quite sensitive to the choice of bandwidth, and this
paper delivers the first higher-order accurate bandwidth selection procedures for the
bispectrum.

A Technical proofs 

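Lemma 1. The expectation of $\hat{C}(\tau)$ is

$$E[\hat{C}(\tau)] = \left(1 - \frac{\gamma}{N}\right) C(\tau) + O\left(\frac{1}{N}\right).$$

Proof of Lemma 1. Let $y_t = x_t - \mu$, then

$$
\begin{aligned}
E[\hat{C}(\tau)]
&= E\left[\frac{1}{N} \sum_{t=1}^{N-\gamma} \prod_{j=1}^{s} \left(x_{t-\alpha+\tau_j} - \bar{x}\right)\right] \\
&= \frac{1}{N} \sum_{t=1}^{N-\gamma} E\left[\prod_{j=1}^{s} \left(y_{t-\alpha+\tau_j} - \bar{y}\right)\right] \\
&= \frac{1}{N} \sum_{t=1}^{N-\gamma} E\left[\sum_{\delta \in \{0,1\}^s} \prod_{j=1}^{s} (-\bar{y})^{\delta_j}\, y_{t-\alpha+\tau_j}^{1-\delta_j}\right] \\
&= \frac{1}{N} \sum_{t=1}^{N-\gamma} C(\tau) + \frac{1}{N} \sum_{t=1}^{N-\gamma} \sum_{\delta \in V} E\left[\prod_{j=1}^{s} (-\bar{y})^{\delta_j}\, y_{t-\alpha+\tau_j}^{1-\delta_j}\right].
\end{aligned}
$$

In the summation above, $V$ denotes the set of all binary $s$-tuples excluding the $s$-tuple
$\{0, \dots, 0\}$; $V$ has cardinality $2^s - 1$. Let $\delta \in V$ and $\ell$ be its weight, i.e. $\ell = \sum_{j=1}^{s} \delta_j$.
Let us suppose, w.l.o.g., that the first $\ell$ components of $\delta$ are 1 and the rest 0. Then
the term in the above summation corresponding to this $\delta$ can be written as

$$
\begin{aligned}
E\left[\prod_{j=1}^{\ell} (-\bar{y}) \prod_{j=\ell+1}^{s} y_{t-\alpha+\tau_j}\right]
&= \frac{(-1)^{\ell}}{N^{\ell}} \sum_{u_1=1}^{N} \cdots \sum_{u_\ell=1}^{N} E\left[\prod_{i=1}^{\ell} y_{u_i} \prod_{j=\ell+1}^{s} y_{t-\alpha+\tau_j}\right] \\
&= \frac{(-1)^{\ell}}{N^{\ell}} \sum_{u_1=1}^{N} \cdots \sum_{u_\ell=1}^{N} C\left(u_1, \dots, u_\ell, t-\alpha+\tau_{\ell+1}, \dots, t-\alpha+\tau_s\right) \\
&= O\left(\frac{1}{N}\right).
\end{aligned}
$$

The last equality follows from the absolute summability of $C(\tau)$. Since for every $\delta \in V$
the expectation as above is $O(N^{-1})$, the result follows.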

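Proof of Theorem 1. Using Lemma 1 and property (iii) of the lag-window, the
expectation of $\hat{f}(\omega)$ can be expressed as

$$
\begin{aligned}
E[\hat{f}(\omega)]
&= \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| < N} \lambda_M(\tau) e^{-i\tau\cdot\omega} \left(\left(1 - \frac{\gamma}{N}\right) C(\tau) + O\left(\frac{1}{N}\right)\right) \\
&= \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| < N} \left(1 - \frac{\gamma}{N}\right) C(\tau) \lambda_M(\tau) e^{-i\tau\cdot\omega} + O\left(\frac{M^{s-1}}{N}\right).
\end{aligned}
$$

The bias of $\hat{f}(\omega)$ is

$$
\begin{aligned}
E[\hat{f}(\omega)] - f(\omega)
&= \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| < N} \left(\lambda_M(\tau)\left(1 - \frac{\gamma}{N}\right) - 1\right) C(\tau) e^{-i\tau\cdot\omega}
 - \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| \ge N} C(\tau) e^{-i\tau\cdot\omega} + O\left(\frac{M^{s-1}}{N}\right) \\
&= A_1 + A_2 + A_3 + O\left(\frac{M^{s-1}}{N}\right),
\end{aligned}
$$

where

$$
A_1 = \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| < N} \left(\lambda_M(\tau) - 1\right) C(\tau) e^{-i\tau\cdot\omega}, \quad
A_2 = -\frac{1}{(2\pi)^{s-1} N} \sum_{\|\tau\| < N} \gamma\, \lambda_M(\tau) C(\tau) e^{-i\tau\cdot\omega}, \quad
A_3 = -\frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| \ge N} C(\tau) e^{-i\tau\cdot\omega}.
$$

By the assumption on the summability of $C(\tau)$, $|A_3|$ can be bounded as

$$|A_3| \le \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| \ge N} |C(\tau)| \le \frac{1}{(2\pi)^{s-1} N^k} \sum_{\|\tau\| \ge N} \|\tau\|^k |C(\tau)| = o\left(\frac{1}{N^k}\right).$$

Also,

$$|A_2| \le \frac{1}{(2\pi)^{s-1} N} \sum_{\|\tau\| < N} |\gamma|\, |C(\tau)| = O\left(\frac{1}{N}\right).$$

Now rewrite $A_1$ as

$$
A_1 = \frac{1}{(2\pi)^{s-1}} \sum_{\|\tau\| \le bM} \left(\lambda_M(\tau) - 1\right) C(\tau) e^{-i\tau\cdot\omega}
 + \frac{1}{(2\pi)^{s-1}} \sum_{bM < \|\tau\| < N} \left(\lambda_M(\tau) - 1\right) C(\tau) e^{-i\tau\cdot\omega}.
$$

The first sum vanishes because the flat-top lag-window equals 1 on this region.

Proof of (i).
Since $|\lambda(s)| \le 1$,

$$|A_1| \le \frac{2}{(2\pi)^{s-1}} \sum_{\|\tau\| > bM} |C(\tau)| \le \frac{2}{(2\pi)^{s-1} (bM)^k} \sum_{\|\tau\| > bM} \|\tau\|^k |C(\tau)| = o\left(M^{-k}\right).$$

The claimed bias rate now follows, and thus $\mathrm{MSE}(\hat{f}(\omega)) \sim o(M^{-2k}) + O(M^{s-1}/N)$.

Proof of (ii).
We have $\mathrm{bias}(\hat{f}(\omega)) = A_1 + O(M^{s-1}/N)$, where under the assumptions of (ii),

$$|A_1| \le \frac{2}{(2\pi)^{s-1}} \sum_{\|\tau\| > bM} |C(\tau)| \le \frac{2D}{(2\pi)^{s-1}} \sum_{\|\tau\| > bM} e^{-d\|\tau\|} = O\left(e^{-dbM}\right).$$

Therefore $\mathrm{MSE}(\hat{f}(\omega)) \sim O(e^{-2dbM}) + O(M^{s-1}/N)$ is asymptotically minimized when
$M \sim A \log N$ where $A = 1/(2db)$, and the claimed rate holds for all $A \ge 1/(2db)$.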
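As a quick check of the choice of $M$ in part (ii), the two MSE terms can be evaluated at $M = A \log N$. This is a worked sketch that uses only the bound stated above:

```latex
$$
e^{-2dbM}\Big|_{M = A\log N} = N^{-2dbA},
\qquad
\frac{M^{s-1}}{N}\Big|_{M = A\log N} = \frac{(A \log N)^{s-1}}{N}.
$$
```

The squared-bias term is $O(1/N)$ exactly when $2dbA \ge 1$, that is, $A \ge 1/(2db)$, in which case it is dominated by the variance term and the MSE is $O((\log N)^{s-1}/N)$.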

Proof of (iii).

We have $\mathrm{bias}(\hat{f}(\omega)) = A_1 + O(1/N)$, but under the assumptions of (iii), $A_1 = 0$.
Hence the bias and variance are $O(1/N)$.

Proof of Theorem [2j Let -r^ be any element of norm rh for which 



(21) 



and let t'~ € Bm,m+i, so that m < ||r^J| < rh + 1, and 



Equations (fTUj) and (fT3j) give 



1/5(^)1 < £;/°P (22) 



^log^iV (23) 
In part (i), $\rho(\tau) \sim A\|\tau\|^{-d}$, so for any $\epsilon > 0$, we can find $\tau_0$ such that

$$A(1 - \epsilon)\|\tau\|^{-d} < \rho(\tau) < A(1 + \epsilon)\|\tau\|^{-d} \tag{24}$$

when $\|\tau\| > \tau_0$. Similarly, for any $\epsilon > 0$, there exists $\tau_0$ large enough such that

$$(1 - \epsilon)\hat{m}^{-d} < \|\tau\|^{-d} < (1 + \epsilon)\hat{m}^{-d} \quad \text{for all } \tau \in B_{\hat{m},\hat{m}+1} \tag{25}$$

when $\hat{m} > \tau_0$. Putting equations (21), (22), (23), (24), and (25) together gives, with
high probability,

$$A(1 - \epsilon)^2 \hat{m}^{-d} < k\sqrt{\frac{\log N}{N}} < A(1 + \epsilon)^2 \hat{m}^{-d} \tag{26}$$

up to $O_P\left(\sqrt{(\log\log N)/N}\right)$, which is negligible as $N$ gets large. Equation (26) is equivalent to

$$(1 - \epsilon)^{2/d} < \hat{m}\, \frac{k^{1/d} (\log N)^{1/2d}}{A^{1/d} N^{1/2d}} < (1 + \epsilon)^{2/d}$$

with high probability. Therefore

$$\hat{m} \overset{P}{\sim} \frac{A^{1/d} N^{1/2d}}{k^{1/d} (\log N)^{1/2d}}.$$

The proof of part (ii) is similar.

Now we prove part (iii). Note that m > q only if 



max \p( T )-p(r)\>k\l 1 ^ (27) 

-rGB„,o+a„ V iV 



but since C(r) = when ||r|| > q, equation (|13h then shows 



max |p( T )|= 0p J!« (28) 
TaB q , q+aN \ v iv y 

since ajy = o(log AT). The probability of ([2?]) and ([28]) happening simultaneously tends 
to zero, hence P{rh > q) — > 0. Now if m < then 



Brh,m+a N \p{r)\ = \p(r)\ + O, 



V 



log log N 



N 



shows that ([TO]) must eventually be violated, hence P(m < q) — > and the result 
follows. 

Proof of Theorem 3. Parts (ii) and (iii) follow from Theorems 1 and 2 and the
$\delta$-method; see [12] for more details. For part (i), first note that $\sum_{\tau \in \mathbb{Z}^{s-1} \setminus \{0\}} \|\tau\|^{-a} < \infty$
if and only if $a > s - 1$. In order for $\sum_{\tau \in \mathbb{Z}^{s-1}} \|\tau\|^{k+2} |C(\tau)| < \infty$, for some $k \ge 1$, $d$
must satisfy $d - k - 2 > s - 1$, or $d > s + k + 1 \ge s + 2$. Now the results of Theorem 1
hold for f LJu uj J in replace of f{oJ\, UJ2) for any positive integer k < d— s — 1, in particular 
for $k = [d - s - 2]$. From the proof of Theorem 1, the bias is of order $o(1/M^k)$, and
since the variance is of smaller order, the result now follows from substituting $M$ with
the rate $(N/\log N)^{1/2d}$ from Theorem 2 (i).
B Histograms

Below are histograms of the bandwidths selected by procedures (a) through (e).
The top row in every figure corresponds to N = 200 and the bottom row corresponds
to N = 2000.
Figure 2: Histograms based on iid data.
Figure 3: Histograms based on ARMA data.
Figure 4: Histograms based on GARCH data.
Figure 5: Histograms based on bilinear data.